Results 1-20 of 75
1.
Psychol Methods ; 2023 Dec 25.
Article in English | MEDLINE | ID: mdl-38147039

ABSTRACT

Self-report scales are widely used in psychology to compare means in latent constructs across groups, experimental conditions, or time points. However, for these comparisons to be meaningful and unbiased, the scales must demonstrate measurement invariance (MI) across the compared time points or (experimental) groups; MI testing determines whether the latent constructs are measured equivalently across groups or time. We conducted a systematic review of 426 psychology articles with openly available data to (a) examine common practices in conducting and reporting MI tests, (b) assess whether we could reproduce the reported MI results, and (c) conduct MI tests for the comparisons that enabled sufficiently powerful testing. We identified 96 articles that contained a total of 929 comparisons. Results showed that only 4% of the 929 comparisons underwent MI testing, and the tests were generally poorly reported. None of the reported MI tests were reproducible, and only 26% of the 174 newly performed MI tests reached sufficient (scalar) invariance, with MI failing completely in 58% of tests. Exploratory analyses suggested that in nearly half of the comparisons where configural invariance was rejected, the number of factors differed between groups. These results indicate that MI tests are rarely conducted and poorly reported in psychological studies. Given the frequent violations of MI we observed, reported differences between (experimental) groups may not be attributable solely to group differences in the latent constructs. We offer recommendations aimed at improving reporting and computational reproducibility practices in psychology. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
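
In practice, MI is assessed by fitting a sequence of nested multigroup factor models (configural, then metric with equal loadings, then scalar with equal loadings and intercepts) and comparing their fit. Below is a minimal sketch of the model-comparison step only, assuming the models were already fitted elsewhere (e.g., in lavaan or semopy); the fit statistics are hypothetical placeholders.

```python
# Chi-square difference test between nested measurement-invariance models.
from scipy.stats import chi2

def chi2_difference(chi2_restricted, df_restricted, chi2_free, df_free):
    """p-value for the nested-model chi-square difference test."""
    delta_chi2 = chi2_restricted - chi2_free
    delta_df = df_restricted - df_free
    return chi2.sf(delta_chi2, delta_df)

# Hypothetical fit: configural (free) vs. metric (equal loadings).
p_metric = chi2_difference(chi2_restricted=112.4, df_restricted=58,
                           chi2_free=98.1, df_free=52)
print(f"Metric invariance rejected at alpha = .05: {p_metric < .05}")
```

Scalar invariance, the level most comparisons in the review failed to reach, additionally constrains the item intercepts to equality across groups.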

2.
Behav Res Methods ; 2023 Nov 10.
Article in English | MEDLINE | ID: mdl-37950113

ABSTRACT

Preregistration has gained traction as one of the most promising solutions to improve the replicability of scientific effects. In this project, we compared 193 psychology studies that earned a Preregistration Challenge prize or preregistration badge to 193 related studies that were not preregistered. In contrast to our theoretical expectations and prior research, we did not find that preregistered studies had a lower proportion of positive results (Hypothesis 1), smaller effect sizes (Hypothesis 2), or fewer statistical errors (Hypothesis 3) than non-preregistered studies. Supporting our Hypotheses 4 and 5, we found that preregistered studies more often contained power analyses and typically had larger sample sizes than non-preregistered studies. Finally, concerns about the publishability and impact of preregistered studies seem unwarranted, as preregistered studies did not take longer to publish and scored better on several impact measures. Overall, our data indicate that preregistration has beneficial effects in the realm of statistical power and impact, but we did not find robust evidence that preregistration prevents p-hacking and HARKing (Hypothesizing After the Results are Known).
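
The comparisons behind Hypotheses 1-3 are standard two-sample questions; for instance, the proportion of positive results in each group of 193 studies can be compared with a two-proportion z test. A minimal sketch with hypothetical counts (the paper's actual analyses may differ, e.g., using regression or Bayesian models):

```python
# Compare the proportion of positive results between preregistered and
# non-preregistered studies. Counts are hypothetical placeholders.
from statsmodels.stats.proportion import proportions_ztest

positives = [150, 160]   # studies with positive results per group (hypothetical)
totals = [193, 193]      # studies per group
z, p = proportions_ztest(count=positives, nobs=totals)
print(f"z = {z:.2f}, p = {p:.3f}")
```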

3.
R Soc Open Sci ; 10(8): 202326, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37593717

ABSTRACT

The COVID-19 outbreak has led to an exponential increase in publications and preprints about the virus, its causes, consequences, and possible cures. COVID-19 research has been conducted under high time pressure and has been subject to financial and societal interests. Doing research under such pressure may influence the scrutiny with which researchers perform and write up their studies: researchers may become more diligent because of the high-stakes nature of the research, or the time pressure may lead them to cut corners and produce lower-quality output. In this study, we conducted a natural experiment to compare the prevalence of incorrectly reported statistics in a stratified random sample of COVID-19 preprints and a matched sample of non-COVID-19 preprints. Our results show that the overall prevalence of incorrectly reported statistics is 9-10%, but frequentist as well as Bayesian hypothesis tests show no difference in the number of statistical inconsistencies between COVID-19 and non-COVID-19 preprints. In conclusion, the literature suggests that COVID-19 research may on average have more methodological problems than non-COVID-19 research, but our results show no difference in statistical reporting quality.
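
A statistic is "incorrectly reported" in this sense when the test statistic, degrees of freedom, and p-value are mutually inconsistent. A minimal sketch of such a consistency check for a t test (the reported values below are hypothetical; the R package statcheck automates this for statistics extracted from full texts):

```python
# Recompute a p-value from a reported t statistic and df, then compare it
# with the reported p-value, allowing for rounding to three decimals.
from scipy.stats import t

def t_test_consistent(t_value, df, reported_p, two_tailed=True, tol=5e-4):
    recomputed = t.sf(abs(t_value), df) * (2 if two_tailed else 1)
    return abs(recomputed - reported_p) < tol, recomputed

ok, p = t_test_consistent(t_value=2.31, df=48, reported_p=0.025)
print(f"recomputed p = {p:.4f}, consistent with report: {ok}")
```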

4.
Behav Res Methods ; 2023 Jul 24.
Article in English | MEDLINE | ID: mdl-37540470

ABSTRACT

Outcome reporting bias (ORB) refers to the bias caused by researchers selectively reporting outcomes within a study based on their statistical significance. ORB leads to inflated effect size estimates in meta-analysis if, for example, only the outcome with the largest effect size is reported. We propose a new method (CORB) to correct for ORB that includes an estimate of the variability of the outcomes' effect size as a moderator in a meta-regression model. An estimate of this variability can be computed by assuming a correlation among the outcomes. Results of a Monte Carlo simulation study showed that the effect size in meta-analyses may be severely overestimated without correcting for ORB. Estimates of CORB are close to the true effect size precisely when the overestimation caused by ORB is largest. Applying the method to a meta-analysis on the effect of playing violent video games on aggression showed that the effect size estimate decreased when correcting for ORB. We recommend routinely applying methods to correct for ORB in any meta-analysis, and we provide annotated R code and functions to help researchers apply the CORB method.
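
A minimal sketch of the meta-regression step this describes, assuming the per-study estimate of outcome effect-size variability has already been computed. All arrays are hypothetical placeholders; the authors' own R functions implement the full method.

```python
# Meta-regression with outcome effect-size variability as moderator,
# weighted by inverse sampling variance.
import numpy as np
import statsmodels.api as sm

effect = np.array([0.42, 0.38, 0.55, 0.21, 0.33])      # observed effect sizes
samp_var = np.array([0.02, 0.03, 0.05, 0.01, 0.02])    # sampling variances
outcome_sd = np.array([0.10, 0.12, 0.20, 0.05, 0.09])  # est. outcome-ES variability

X = sm.add_constant(outcome_sd)
fit = sm.WLS(effect, X, weights=1.0 / samp_var).fit()
# The intercept estimates the effect when outcome variability is zero,
# i.e., when there is nothing for selective outcome reporting to exploit.
print(fit.params)
```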

5.
Educ Psychol Meas ; 83(4): 684-709, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37398839

ABSTRACT

When cognitive and educational tests are administered under time limits, tests may become speeded and this may affect the reliability and validity of the resulting test scores. Prior research has shown that time limits may create or enlarge gender gaps in cognitive and academic testing. On average, women complete fewer items than men when a test is administered with a strict time limit, whereas gender gaps are frequently reduced when time limits are relaxed. In this study, we propose that gender differences in test strategy might inflate gender gaps favoring men, and relate test strategy to stereotype threat effects under which women underperform due to the pressure of negative stereotypes about their performance. First, we applied a Bayesian two-dimensional item response theory (IRT) model to data obtained from two registered reports that investigated stereotype threat in mathematics, and estimated the latent correlation between underlying test strategy (here, completion factor, a proxy for working speed) and mathematics ability. Second, we tested the gender gap and assessed potential effects of stereotype threat on female test performance. We found a positive correlation between the completion factor and mathematics ability, such that more able participants dropped out later in the test. We did not observe a stereotype threat effect but found larger gender differences on the latent completion factor than on latent mathematical ability, suggesting that test strategies affect the gender gap in timed mathematics performance. We argue that if the effect of time limits on tests is not taken into account, this may lead to test unfairness and biased group comparisons, and urge researchers to consider these effects in either their analyses or study planning.
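
One way to read the two-dimensional model is as separating two latent traits: how far into the test a respondent gets (the completion factor) and how accurately they answer the items they reach. The sketch below illustrates that idea with logistic item response functions; the parameterization and values are illustrative assumptions, not the paper's exact model.

```python
# Two latent traits: one drives whether an item is reached before time
# runs out, the other drives accuracy given that it is reached.
import numpy as np

def p_reached(theta_completion, b_position=0.0):
    return 1 / (1 + np.exp(-(theta_completion - b_position)))

def p_correct(theta_ability, b_item=0.0):
    return 1 / (1 + np.exp(-(theta_ability - b_item)))

# Joint probability of answering a late item correctly within the limit:
theta_c, theta_a = -0.5, 0.8   # hypothetical person parameters
print(f"{p_reached(theta_c, 1.0) * p_correct(theta_a, 0.2):.3f}")
```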

6.
Front Public Health ; 11: 1171851, 2023.
Article in English | MEDLINE | ID: mdl-37415707

ABSTRACT

Background: Empirical evidence indicates that both HIV infection and stunting impede the cognitive functioning of school-going children. However, there is less evidence on how these two risk factors amplify each other's negative effects. This study aimed to examine the direct effects of stunting on cognitive outcomes and the extent to which stunting (partially) mediates the effects of HIV, age, and gender on cognitive outcomes. Methodology: We applied structural equation modelling to cross-sectional data from 328 children living with HIV and 260 children living without HIV, aged 6-14 years, from Nairobi, Kenya, to test the mediating effect of stunting and the predictive effects of HIV, age, and gender on the cognitive latent variables flexibility, fluency, reasoning, and verbal memory. Results: The model predicting the cognitive outcomes fitted well (RMSEA = 0.041, CFI = 0.966, χ2 = 154.29, DF = 77, p < 0.001). Height-for-age (a continuous indicator of stunting) predicted fluency (β = 0.14) and reasoning (β = 0.16). HIV predicted height-for-age (β = -0.24) and showed direct effects on reasoning (β = -0.66), fluency (β = -0.34), flexibility (β = 0.26), and verbal memory (β = -0.22), indicating that the effect of HIV on the cognitive variables was partly mediated by height-for-age. Conclusion: In this study, we found evidence that stunting partly explains the effects of HIV on cognitive outcomes. The model suggests there is urgency to develop targeted preventative and rehabilitative nutritional interventions for school children with HIV as part of a comprehensive set of interventions to improve cognitive functioning in this high-risk group of children. Being infected, or having been born to a mother who is HIV positive, poses a risk to normal child development.
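
Under the usual linear path-analytic decomposition, the mediated (indirect) effect is the product of the paths it passes through. A quick worked check with the standardized coefficients reported above:

```python
# Indirect effect of HIV on reasoning via height-for-age, using the
# abstract's reported standardized paths.
a = -0.24   # HIV -> height-for-age
b = 0.16    # height-for-age -> reasoning
indirect = a * b
direct = -0.66  # reported direct HIV -> reasoning path
print(f"indirect = {indirect:.3f}, total = {direct + indirect:.3f}")
```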


Subjects
HIV Infections, Female, Humans, Child, HIV Infections/epidemiology, HIV Infections/complications, Latent Class Analysis, Kenya/epidemiology, Cross-Sectional Studies, Growth Disorders/epidemiology, Growth Disorders/etiology, Cognition
7.
BMC Psychiatry ; 23(1): 373, 2023 05 29.
Article in English | MEDLINE | ID: mdl-37248481

ABSTRACT

INTRODUCTION: Culturally validated neurocognitive measures for children in low- and middle-income countries are important for the timely and correct identification of neurocognitive impairments. Such measures can inform the development of interventions for children exposed to additional vulnerabilities such as HIV infection. The Battery for Neuropsychological Evaluation of Children (BENCI) is an openly available, computerized neuropsychological battery specifically developed to evaluate neurocognitive impairment. This study adapted the BENCI and evaluated its reliability and validity in Kenya. METHODOLOGY: The BENCI was adapted using translation and back-translation from Spanish to English. The psychometric properties were evaluated in a case-control study of 328 children (aged 6-14 years) living with HIV and 260 children not living with HIV in Kenya. We assessed reliability, factor structure, and measurement invariance with respect to HIV. Additionally, we examined convergent validity of the BENCI using tests from the Kilifi Toolkit. RESULTS: Internal consistencies (0.49 < α < 0.97) and test-retest reliabilities (-.34 to .81) were sufficient-to-good for most of the subtests. Convergent validity was supported by significant correlations between the BENCI's Verbal memory and Kilifi's Verbal List Learning (r = .41), the BENCI's Visual memory and Kilifi's Verbal List Learning (r = .32), the BENCI's Planning total time test and Kilifi's Tower Test (r = -.21), and the BENCI's Abstract Reasoning test and Kilifi's Raven's Progressive Matrices (r = .21). The BENCI subtests highlighted meaningful differences between children living with HIV and those not living with HIV. After some minor adaptations, a confirmatory four-factor model consisting of flexibility, fluency, reasoning, and working memory fitted well (χ2 = 135.57, DF = 51, N = 604, p < .001, RMSEA = .052, CFI = .944, TLI = .914) and was partially scalar invariant between the HIV-positive and HIV-negative groups. CONCLUSION: The English version of the BENCI formally translated for use in Kenya can be further adapted and integrated in clinical and research settings as a valid and reliable cognitive test battery.
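
The internal-consistency figures here are Cronbach's alpha. A minimal sketch of its computation from an items-by-respondents score matrix; the simulated data below simply share a common factor and are not the BENCI's.

```python
# Cronbach's alpha from a score matrix (rows = respondents, columns = items).
import numpy as np

def cronbach_alpha(items):
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(0)
scores = rng.normal(size=(100, 8)) + rng.normal(size=(100, 1))  # shared factor
print(f"alpha = {cronbach_alpha(scores):.2f}")
```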


Subjects
HIV Infections, Humans, Child, Kenya, HIV Infections/complications, HIV Infections/diagnosis, HIV Infections/psychology, Psychometrics, Reproducibility of Results, Case-Control Studies, Neuropsychological Tests, Surveys and Questionnaires
8.
R Soc Open Sci ; 10(2): 210586, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36756069

ABSTRACT

Increased execution of replication studies contributes to the effort to restore the credibility of empirical research. However, a second generation of problems arises: the number of potential replication targets is at a serious mismatch with available resources. Given limited resources, replication target selection should be well justified, systematic, and transparently communicated. At present, the discussion of what to consider when selecting a replication target is limited to theoretical arguments, self-reported justifications, and a few formalized suggestions. In this Registered Report, we proposed a study involving the scientific community to create a list of considerations for consultation when selecting a replication target in psychology. We employed a modified Delphi approach. First, we constructed a preliminary list of considerations. Second, we surveyed psychologists who had previously selected a replication target about their considerations. Third, we incorporated the results into the preliminary list and sent the updated list to a group of individuals knowledgeable about concerns regarding replication target selection. Over the course of several rounds, we established consensus regarding what to consider when selecting a replication target. The resulting checklist can be used to transparently communicate the rationale for selecting studies for replication.

9.
Psychon Bull Rev ; 30(4): 1609-1620, 2023 Aug.
Article in English | MEDLINE | ID: mdl-36635588

ABSTRACT

Employing two vignette studies, we examined how psychology researchers interpret the results of a set of four experiments that all test a given theory. In both studies, we found that participants' belief in the theory increased with the number of statistically significant results, and that the result of a direct replication had a stronger effect on belief in the theory than the result of a conceptual replication. In Study 2, we additionally found that participants' belief in the theory was lower when they assumed the presence of p-hacking, but that belief in the theory did not differ between preregistered and non-preregistered replication studies. In analyses of individual participant data from both studies, we examined the heuristics academics use to interpret the results of four experiments. Only a small proportion (Study 1: 1.6%; Study 2: 2.2%) of participants used the normative method of Bayesian inference, whereas many of the participants' responses were in line with generally dismissed and problematic vote-counting approaches. Our studies demonstrate that many psychology researchers overestimate the evidence in favor of a theory if one or more results from a set of replication studies are statistically significant, highlighting the need for better statistical education.
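
The normative benchmark mentioned here can be made concrete: treat each experiment's significance decision as data and update the odds that the theory is true, where the probability of a significant result is the study's power under the theory and the alpha level otherwise. A minimal sketch under assumed values (the power, alpha, and prior are illustrative, not the vignettes' parameters):

```python
# Bayesian updating from the pattern of significant results in a set of
# replication studies.
def posterior_prob(n_significant, n_studies, power=0.8, alpha=0.05,
                   prior_prob=0.5):
    prior_odds = prior_prob / (1 - prior_prob)
    lr = (power / alpha) ** n_significant * \
         ((1 - power) / (1 - alpha)) ** (n_studies - n_significant)
    posterior_odds = prior_odds * lr
    return posterior_odds / (1 + posterior_odds)

for k in range(5):
    print(f"{k} of 4 significant -> P(theory) = {posterior_prob(k, 4):.3f}")
```

Under these assumptions, even two significant results out of four yield posterior support above .9, which is exactly where vote-counting intuitions ("only half worked") tend to mislead.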


Subjects
Heuristics, Politics, Humans, Bayes Theorem, Psychology
10.
Psychosom Med ; 85(2): 188-202, 2023.
Article in English | MEDLINE | ID: mdl-36640440

ABSTRACT

OBJECTIVE: Type D personality, a joint tendency toward negative affectivity and social inhibition, has been linked to adverse events in patients with heart disease, although with inconsistent findings. Here, we apply an individual-patient-data meta-analysis to data from 19 prospective cohort studies (N = 11,151) to investigate the prediction of adverse outcomes by Type D personality in patients with acquired cardiovascular disease. METHOD: For each outcome (all-cause mortality, cardiac mortality, myocardial infarction, coronary artery bypass grafting, percutaneous coronary intervention, major adverse cardiac event, any adverse event), we estimated Type D's prognostic influence and its moderation by age, sex, and disease type. RESULTS: In patients with cardiovascular disease, evidence for a Type D effect in terms of the Bayes factor (BF) was strong for major adverse cardiac event (BF = 42.5; odds ratio [OR] = 1.14) and any adverse event (BF = 129.4; OR = 1.15). Evidence for the null hypothesis was found for all-cause mortality (BF = 45.9; OR = 1.03), cardiac mortality (BF = 23.7; OR = 0.99), and myocardial infarction (BF = 16.9; OR = 1.12), suggesting that Type D had no effect on these outcomes. This evidence was similar in the subset of patients with coronary artery disease (CAD), but inconclusive for patients with heart failure (HF). Positive effects were found for negative affectivity on cardiac and all-cause mortality, with the latter being more pronounced in male than in female patients. CONCLUSION: Across 19 prospective cohort studies, Type D predicts adverse events in patients with CAD, whereas evidence in patients with HF was inconclusive. In both patients with CAD and patients with HF, we found evidence for a null effect of Type D on cardiac and all-cause mortality.
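
For readers more used to probabilities than Bayes factors, a BF maps directly onto a posterior probability once prior odds are fixed. A small worked conversion, assuming equal prior odds (an illustrative default, not the paper's claim):

```python
# Convert a Bayes factor into a posterior probability for the favored
# hypothesis, given prior odds.
def bf_to_posterior(bf, prior_odds=1.0):
    posterior_odds = bf * prior_odds
    return posterior_odds / (1 + posterior_odds)

print(f"MACE (BF = 42.5): P(effect | data) = {bf_to_posterior(42.5):.3f}")
print(f"All-cause mortality (BF = 45.9 for the null): "
      f"P(null | data) = {bf_to_posterior(45.9):.3f}")
```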


Subjects
Cardiovascular Diseases, Coronary Artery Disease, Myocardial Infarction, Percutaneous Coronary Intervention, Type D Personality, Humans, Male, Female, Cardiovascular Diseases/epidemiology, Cardiovascular Diseases/etiology, Prospective Studies, Bayes Theorem, Coronary Artery Disease/etiology, Myocardial Infarction/epidemiology, Myocardial Infarction/etiology, Risk Factors, Treatment Outcome
12.
F1000Res ; 11: 471, 2022.
Article in English | MEDLINE | ID: mdl-36128558

ABSTRACT

Background: Traditionally, research integrity studies have focused on research misbehaviors and their explanations. Over time, attention has shifted towards preventing questionable research practices and promoting responsible ones. However, data on the prevalence of responsible research practices (especially open methods, open code, and open data) and their underlying associated factors remain scarce. Methods: We conducted a web-based anonymized questionnaire, targeting all academic researchers working at or affiliated with a university or university medical center in the Netherlands, to investigate the prevalence and potential explanatory factors of 11 responsible research practices. Results: A total of 6,813 academics completed the survey. The results show that the prevalence of responsible practices differs substantially across disciplines and ranks, with 99 percent avoiding plagiarism in their work but fewer than 50 percent preregistering a research protocol. Arts and humanities scholars, as well as PhD candidates and junior researchers, engaged less often in responsible research practices. Publication pressure negatively affected responsible practices, while mentoring, subscription to scientific norms, and funding pressure stimulated them. Conclusions: Understanding the prevalence of responsible research practices across disciplines and ranks, as well as their associated explanatory factors, can help to systematically address disciplinary and academic-rank-specific obstacles, and thereby facilitate responsible conduct of research.


Subjects
Humanities, Researchers, Humans, Netherlands, Prevalence, Universities
13.
PLoS One ; 17(2): e0263023, 2022.
Article in English | MEDLINE | ID: mdl-35171921

ABSTRACT

The prevalence of research misconduct and questionable research practices (QRPs), and their associations with a range of explanatory factors, has not been studied sufficiently among academic researchers. The National Survey on Research Integrity targeted all disciplinary fields and academic ranks in the Netherlands. It included questions about engagement in fabrication, falsification, and 11 QRPs over the previous three years, and 12 explanatory factor scales. We ensured strict identity protection and used the randomized response method for the questions on research misconduct. In total, 6,813 respondents completed the survey. The prevalence of fabrication was 4.3% (95% CI: 2.9, 5.7) and of falsification 4.2% (95% CI: 2.8, 5.6). The prevalence of QRPs ranged from 0.6% (95% CI: 0.5, 0.9) to 17.5% (95% CI: 16.4, 18.7), with 51.3% (95% CI: 50.1, 52.5) of respondents engaging frequently in at least one QRP. Being a PhD candidate or junior researcher increased the odds of frequently engaging in at least one QRP, as did being male. Subscription to scientific norms (odds ratio (OR) 0.79; 95% CI: 0.63, 1.00) and perceived likelihood of detection by reviewers (OR 0.62; 95% CI: 0.44, 0.88) were associated with less research misconduct. Publication pressure was associated with more often engaging frequently in one or more QRPs (OR 1.22; 95% CI: 1.14, 1.30). We found a higher prevalence of misconduct than earlier surveys. Our results suggest that greater emphasis on subscription to scientific norms, strengthening reviewers in their role as gatekeepers of research quality, and curbing the "publish or perish" incentive system would promote research integrity.
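
The randomized response method protects respondents by injecting known noise into individual answers; the prevalence is then recovered from the aggregate "yes" rate. A minimal sketch of the estimator for one common forced-response design (the design parameters and observed rate below are assumptions, since the abstract does not give them):

```python
# Forced-response randomized response: with probability 1/6 the respondent
# must answer "yes", with probability 1/6 "no", and otherwise truthfully
# (e.g., dice-determined). Recover prevalence from the observed yes-rate.
def rr_prevalence(prop_yes, p_forced_yes=1/6, p_forced_no=1/6):
    p_truthful = 1 - p_forced_yes - p_forced_no
    return (prop_yes - p_forced_yes) / p_truthful

print(f"{rr_prevalence(prop_yes=0.17):.3f}")  # hypothetical observed rate
```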


Subjects
Biomedical Research/ethics, Ethics, Research, Research Design/standards, Researchers/ethics, Scientific Misconduct/ethics, Scientific Misconduct/statistics & numerical data, Cross-Sectional Studies, Female, Humans, Male, Prevalence, Surveys and Questionnaires
14.
J Appl Psychol ; 107(11): 2013-2039, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34968082

ABSTRACT

Effect misestimations plague psychological science, but advances in the identification of dissemination biases in general, and publication bias in particular, have helped in dealing with biased effects in the literature. However, the application of publication bias detection methods appears not to be equally prevalent across subdisciplines; it has been suggested that appropriate detection methods are underused in I/O psychology in particular. In this meta-meta-analysis, we present prevalence estimates, predictors, and time trends of publication bias in 128 meta-analyses published in the Journal of Applied Psychology (7,263 effect sizes, more than 3,000,000 participants). Moreover, we reanalyzed the data of 87 meta-analyses and applied nine standard and more modern publication bias detection methods. We show that (a) bias detection methods are underused (only 41% of meta-analyses use at least one method), although their use has increased in recent years, (b) the meta-analyses that apply such methods now use more, but mostly inappropriate, methods, and (c) the prevalence of potential publication bias is concerning but mostly remains undetected. Although our results indicate somewhat of a trend toward higher bias awareness, they substantiate concerns about potential publication bias in I/O psychology, warranting increased researcher awareness of appropriate and state-of-the-art bias detection and triangulation. Embracing open science practices such as data sharing and study preregistration is needed to raise reproducibility and ultimately strengthen psychological science in general and I/O psychology in particular. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
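
One of the standard detection methods referred to here is Egger's regression test for small-study effects: regress the standardized effect on precision and test whether the intercept departs from zero. A minimal sketch on hypothetical data in which smaller studies show larger effects:

```python
# Egger's regression test for funnel-plot asymmetry.
import numpy as np
import statsmodels.api as sm

effect = np.array([0.51, 0.43, 0.30, 0.25, 0.22, 0.18])  # hypothetical
se = np.array([0.20, 0.16, 0.12, 0.10, 0.08, 0.05])      # standard errors

X = sm.add_constant(1.0 / se)          # precision
fit = sm.OLS(effect / se, X).fit()     # standardized effect
intercept, intercept_p = fit.params[0], fit.pvalues[0]
print(f"Egger intercept = {intercept:.2f}, p = {intercept_p:.3f}")
```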


Subjects
Psychology, Industrial, Humans, Publication Bias, Reproducibility of Results, Prevalence, Bias
15.
Gen Hosp Psychiatry ; 71: 62-75, 2021.
Article in English | MEDLINE | ID: mdl-33962138

ABSTRACT

INTRODUCTION: Type D personality, operationalized as high scores on negative affectivity (NA) and social inhibition (SI), has been associated with various medical and psychosocial outcomes. The recent failure to replicate several earlier findings could result from the various methods used to assess the Type D effect. Despite recommendations to analyze the continuous NA and SI scores, a popular approach groups people as having Type D personality or not. This dichotomous method does not adequately detect a Type D effect, as it is also sensitive to main effects of NA or SI alone, suggesting the literature contains false-positive Type D effects. Here, we systematically assess the extent of this problem. METHOD: We conducted a systematic review including 44 published studies that assessed a Type D effect with both a continuous and a dichotomous operationalization. RESULTS: The dichotomous method showed poor agreement with the continuous Type D effect. Of the 89 significant dichotomous-method effects, 37 (41.6%) were Type D effects according to the continuous method. The remaining 52 (58.4%) are therefore likely not Type D effects based on the continuous method, as 42 (47.2%) were main effects of NA or SI only. CONCLUSION: Half of the published Type D effects according to the dichotomous method may be false positives, with only NA or SI driving the outcome.
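
The mechanism is easy to demonstrate by simulation: when an outcome depends on NA alone, the dichotomized Type D group still differs from everyone else, while a continuous model with an NA x SI interaction correctly attributes the effect to NA. A minimal sketch (the scales and the >= 10 cutoff are assumptions in the spirit of the DS14):

```python
# Dichotomous vs. continuous operationalization on simulated data where
# only NA drives the outcome.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({"NA_": rng.normal(10, 5, n), "SI": rng.normal(10, 5, n)})
df["outcome"] = 0.3 * df["NA_"] + rng.normal(size=n)   # NA-only effect
df["typeD"] = ((df["NA_"] >= 10) & (df["SI"] >= 10)).astype(int)

print(smf.ols("outcome ~ typeD", df).fit().params)     # spurious "Type D" effect
print(smf.ols("outcome ~ NA_ * SI", df).fit().params)  # NA main effect only
```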


Subjects
Type D Personality, Humans, Inhibition, Psychological, Personality
16.
PLoS Biol ; 18(12): e3000937, 2020 12.
Article in English | MEDLINE | ID: mdl-33296358

ABSTRACT

Researchers face many, often seemingly arbitrary, choices in formulating hypotheses, designing protocols, collecting data, analyzing data, and reporting results. Opportunistic use of "researcher degrees of freedom" aimed at obtaining statistical significance increases the likelihood of obtaining and publishing false-positive results and overestimated effect sizes. Preregistration is a mechanism for reducing such degrees of freedom by specifying designs and analysis plans before observing the research outcomes. The effectiveness of preregistration may depend, in part, on whether the process facilitates sufficiently specific articulation of such plans. In this preregistered study, we compared 2 formats of preregistration available on the OSF: Standard Pre-Data Collection Registration and Prereg Challenge Registration (now called "OSF Preregistration," http://osf.io/prereg/). The Prereg Challenge format was a "structured" workflow with detailed instructions and an independent review to confirm completeness; the "Standard" format was "unstructured" with minimal direct guidance to give researchers flexibility for what to prespecify. Results of comparing random samples of 53 preregistrations from each format indicate that the "structured" format restricted the opportunistic use of researcher degrees of freedom better (Cliff's Delta = 0.49) than the "unstructured" format, but neither eliminated all researcher degrees of freedom. We also observed very low concordance among coders about the number of hypotheses (14%), indicating that they are often not clearly stated. We conclude that effective preregistration is challenging, and registration formats that provide effective guidance may improve the quality of research.
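
Cliff's delta, the effect size reported for the structured vs. unstructured comparison, is the probability that a random score from one group exceeds one from the other, minus the reverse. A minimal implementation with toy scores:

```python
# Cliff's delta: proportion of (x, y) pairs with x > y minus x < y.
import numpy as np

def cliffs_delta(x, y):
    x, y = np.asarray(x), np.asarray(y)
    diffs = x[:, None] - y[None, :]
    return ((diffs > 0).sum() - (diffs < 0).sum()) / diffs.size

print(cliffs_delta([3, 4, 5, 5], [2, 3, 3, 4]))  # toy scores -> 0.6875
```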


Subjects
Data Collection/methods, Research Design/statistics & numerical data, Data Collection/standards, Data Collection/trends, Humans, Quality Control, Registries/statistics & numerical data, Research Design/trends
17.
J Intell ; 8(4)2020 Oct 02.
Article in English | MEDLINE | ID: mdl-33023250

ABSTRACT

In this meta-study, we analyzed 2442 effect sizes from 131 meta-analyses in intelligence research, published from 1984 to 2014, to estimate the average effect size, median power, and evidence for bias. We found that the average effect size in intelligence research was a Pearson's correlation of 0.26, and the median sample size was 60. Furthermore, across primary studies, we found a median power of 11.9% to detect a small effect, 54.5% to detect a medium effect, and 93.9% to detect a large effect. We documented differences in average effect size and median estimated power between different types of intelligence studies (correlational studies, studies of group differences, experiments, toxicology, and behavior genetics). On average, across all meta-analyses (but not in every meta-analysis), we found evidence for small-study effects, potentially indicating publication bias and overestimated effects. We found no differences in small-study effects between different study types. We also found no convincing evidence for the decline effect, US effect, or citation bias across meta-analyses. We concluded that intelligence research does show signs of low power and publication bias, but that these problems seem less severe than in many other scientific fields.
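
The power figures can be roughly sanity-checked with the Fisher-z approximation for a two-sided test of a Pearson correlation, using the reported median sample size of 60 and r = .10 as the small effect. Note the paper computes power per study, so this is only an approximate reconstruction; it lands near the reported 11.9%.

```python
# Power for a two-sided test of a Pearson correlation via Fisher's z.
import numpy as np
from scipy.stats import norm

def corr_power(r, n, alpha=0.05):
    delta = np.arctanh(r) * np.sqrt(n - 3)
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.cdf(delta - z_crit) + norm.cdf(-delta - z_crit)

print(f"power for r = .10, n = 60: {corr_power(0.10, 60):.3f}")  # ~ .12
```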

18.
PLoS One ; 15(7): e0236079, 2020.
Article in English | MEDLINE | ID: mdl-32735597

ABSTRACT

In this preregistered study, we investigated whether the statistical power of a study is higher when researchers are asked to perform a formal power analysis before collecting data. We compared the sample size descriptions from two sources: (i) a sample of preregistrations created according to the guidelines for the Center for Open Science Preregistration Challenge (PCRs) and a sample of institutional review board (IRB) proposals from the Tilburg School of Social and Behavioral Sciences, both of which include a recommendation to do a formal power analysis, and (ii) a sample of preregistrations created according to the guidelines for Open Science Framework Standard Pre-Data Collection Registrations (SPRs), which give no guidance on sample size planning. We found that the PCRs and IRB proposals (72%) more often included sample size decisions based on power analyses than the SPRs (45%). However, this did not result in larger planned sample sizes: the determined sample size of the PCRs and IRB proposals (Md = 90.50) was not higher than that of the SPRs (Md = 126.00; W = 3389.5, p = 0.936). Typically, power analyses in the registrations were conducted with G*Power, assuming a medium effect size, α = .05, and a power of .80. Only 20% of the power analyses contained enough information to fully reproduce the results, and only 62% of these power analyses pertained to the main hypothesis test in the preregistration. We therefore see ample room for improvement in the quality of the registrations, and we offer several recommendations to do so.
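
The "typical" analysis described (medium effect, α = .05, power = .80) is easy to reproduce outside G*Power; for an independent-samples t test with d = 0.5 it yields roughly 64 participants per group. A sketch using statsmodels:

```python
# A priori sample size for an independent-samples t test.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05,
                                          power=0.80)
print(f"required n per group: {n_per_group:.1f}")  # ~ 63.8
```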


Subjects
Ethics Committees, Research, Sample Size, Statistics as Topic/methods
19.
Psychol Bull ; 146(10): 922-940, 2020 10.
Article in English | MEDLINE | ID: mdl-32700942

ABSTRACT

We examined the evidence for heterogeneity (of effect sizes) when only minor changes to sample population and settings were made between studies, and explored the association between heterogeneity and average effect size in a sample of 68 meta-analyses from 13 preregistered multilab direct replication projects in social and cognitive psychology. Among the many examined effects, examples include the Stroop effect, the "verbal overshadowing" effect, and various priming effects such as "anchoring" effects. We found limited heterogeneity: 48/68 (71%) meta-analyses had nonsignificant heterogeneity, and most (49/68; 72%) were most likely to have zero to small heterogeneity. Power to detect small heterogeneity (as defined by Higgins, Thompson, Deeks, & Altman, 2003) was low for all projects (mean 43%), but good to excellent for medium and large heterogeneity. Our findings thus show little evidence of widespread heterogeneity in direct replication studies in social and cognitive psychology, suggesting that minor changes in sample population and settings are unlikely to affect research outcomes in these fields. We also found strong correlations between observed average effect sizes (standardized mean differences and log odds ratios) and heterogeneity in our sample. Our results suggest that heterogeneity and moderation of effects are unlikely for an average true effect size of zero, but increasingly likely for larger average true effect sizes. (PsycInfo Database Record (c) 2020 APA, all rights reserved).
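
Heterogeneity in this literature is usually quantified with Cochran's Q and the I² statistic, the scale behind labels like "zero to small". A minimal computation on hypothetical study effects:

```python
# Cochran's Q and I^2 from study effect sizes and sampling variances.
import numpy as np
from scipy.stats import chi2

yi = np.array([0.12, 0.20, 0.15, 0.08, 0.18])  # effect sizes (hypothetical)
vi = np.array([0.01, 0.02, 0.01, 0.02, 0.01])  # sampling variances

wi = 1.0 / vi
mu_fixed = np.sum(wi * yi) / np.sum(wi)        # fixed-effect mean
Q = np.sum(wi * (yi - mu_fixed) ** 2)
k_minus_1 = len(yi) - 1
I2 = max(0.0, (Q - k_minus_1) / Q) * 100
print(f"Q = {Q:.2f} (p = {chi2.sf(Q, k_minus_1):.3f}), I^2 = {I2:.1f}%")
```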


Subjects
Meta-Analysis as Topic, Psychology/statistics & numerical data, Female, Humans, Motor Activity, Reproducibility of Results, Stroop Test/statistics & numerical data
20.
PLoS One ; 15(5): e0233107, 2020.
Article in English | MEDLINE | ID: mdl-32459806

ABSTRACT

To determine the reproducibility of psychological meta-analyses, we investigated whether we could reproduce 500 primary study effect sizes drawn from 33 published meta-analyses based on the information given in the meta-analyses, and whether recomputations of primary study effect sizes altered the overall results of the meta-analysis. Results showed that almost half (k = 224) of all sampled primary effect sizes could not be reproduced based on the reported information in the meta-analysis, mostly because of incomplete or missing information on how effect sizes from primary studies were selected and computed. Overall, this led to small discrepancies in the computation of mean effect sizes, confidence intervals and heterogeneity estimates in 13 out of 33 meta-analyses. We provide recommendations to improve transparency in the reporting of the entire meta-analytic process, including the use of preregistration, data and workflow sharing, and explicit coding practices.
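
The reproduction step at issue is mechanical once a meta-analysis reports which means, SDs, and group sizes went into each effect size; the failures arose mostly because that information was missing. A minimal sketch for a standardized mean difference, with hypothetical primary-study values:

```python
# Recompute a primary study's Cohen's d from reported summary statistics,
# to compare against the value coded in the meta-analysis.
import numpy as np

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    sd_pooled = np.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                        / (n1 + n2 - 2))
    return (m1 - m2) / sd_pooled

d = cohens_d(m1=5.2, sd1=1.1, n1=40, m2=4.7, sd2=1.3, n2=38)  # hypothetical
print(f"recomputed d = {d:.3f}")
```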


Subjects
Psychology/methods, Confidence Intervals, Meta-Analysis as Topic, Reproducibility of Results